Online blind speech separation using multiple acoustic speaker tracking and time-frequency masking

Author

  • Pasi Pertilä
Abstract

Separating the speech signals of multiple simultaneous talkers in a reverberant enclosure is known as the cocktail party problem. Real-time applications require online solutions that separate the signals as they are observed, in contrast to offline solutions that separate them after observation. A talker may also move, which the separation system should take into account. This work proposes an online method for speaker detection, speaker direction tracking, and speech separation. The separation is based on multiple acoustic source tracking (MAST) using Bayesian filtering and time-frequency masking. Measurements from three room environments with varying amounts of reverberation, recorded with two different microphone array designs, are used to evaluate the capability of the method to separate up to four simultaneously active speakers. Separation of moving talkers is also considered. Results are compared to two reference methods: ideal binary masking (IBM) and oracle tracking (O-T). Simulations are used to evaluate the effect of the number of microphones and their spacing.
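
For readers unfamiliar with the ideal binary masking (IBM) reference named in the abstract, the following is a minimal sketch of the general idea, not the paper's implementation. It assumes the clean source spectrograms are known (which is what makes the mask "ideal") and assigns each time-frequency bin to whichever source has the most power there. The function names `ideal_binary_masks` and `apply_masks` and the toy data are illustrative assumptions only.

```python
import numpy as np

def ideal_binary_masks(source_specs):
    """Return one binary mask per source.

    source_specs: complex array of shape (n_sources, n_freq, n_frames)
    holding the (assumed known) spectrogram of each clean source.
    A bin is 1 for the source with the highest power in that bin.
    """
    power = np.abs(source_specs) ** 2
    dominant = np.argmax(power, axis=0)  # index of the strongest source per bin
    n_sources = source_specs.shape[0]
    return np.stack([dominant == k for k in range(n_sources)]).astype(float)

def apply_masks(mixture_spec, masks):
    """Multiply the mixture spectrogram by each mask to estimate each source."""
    return masks * mixture_spec[np.newaxis, ...]

# Toy data (hypothetical): two sources occupying disjoint frequency rows.
rng = np.random.default_rng(0)
n_freq, n_frames = 4, 6
s1 = np.zeros((n_freq, n_frames), dtype=complex)
s2 = np.zeros((n_freq, n_frames), dtype=complex)
s1[:2] = rng.standard_normal((2, n_frames)) + 1j * rng.standard_normal((2, n_frames))
s2[2:] = rng.standard_normal((2, n_frames)) + 1j * rng.standard_normal((2, n_frames))

mixture = s1 + s2
masks = ideal_binary_masks(np.stack([s1, s2]))
estimates = apply_masks(mixture, masks)
```

Because the toy sources do not overlap in the time-frequency plane, the masked mixture recovers each source exactly. Real speech mixtures overlap, so binary masking only approximates the sources.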


Similar resources

Modulation domain blind source separation for noisy speech mixture

In this paper, we propose a noise-robust blind speech separation (BSS) method by using two microphones. We first use modulation domain real and imaginary spectral subtraction (MRISS) to enhance both magnitude and phase spectra of the speech mixture inputs. We then estimate the directions of arrival (DOAs) of the speech sources and perform time-acoustic-modulation frequency masking to recover th...


Evaluations on underdetermined blind source separation in adverse environments using time-frequency masking

The successful implementation of speech processing systems in the real world depends on their ability to handle adverse acoustic conditions with undesirable factors such as room reverberation and background noise. In this study, an extension to the established multiple sensors degenerate unmixing estimation technique (MENUET) algorithm for blind source separation is proposed based on the fuzzy c-...


Continuous time-frequency masking method for blind speech separation with adaptive choice of threshold parameter using ICA

We propose a novel method for blind speech separation using continuous time-frequency masking. The method is equipped with an adaptive choice of a threshold parameter that is based on utilization of ICA methods. We present a direct application that consists in the speech segregation for automatic transcription of spoken broadcasts disturbed by background music. Experimental results show improve...


Towards single-channel unsupervised source separation of speech mixtures: the layered harmonics/formants separation-tracking model

Speaker models for blind source separation are typically based on HMMs consisting of vast numbers of states to capture source spectral variation, and trained on large amounts of isolated speech. Since observations can be similar between sources, inference relies on sequential constraints from the state transition matrix which are, however, quite weak. To avoid these problems, we propose a strat...


Blind Signal Separation and Speech Recognition in the Frequency Domain

In this paper it is shown that a Blind Signal Separation (BSS) method in the frequency domain (FDBSS) significantly improves the speaker Signal to Interference Ratio (SIR) and the phoneme recognition score of a continuous speech, speaker-independent acoustic decoder in a multi-simultaneous-speaker office environment. Specifically, the efficiency of the presented FDBSS method is studied on a TIT...



Journal title:
  • Computer Speech & Language

Volume 27, Issue 

Pages  -

Publication date: 2013